Goto

Collaborating Authors

 label space


e6d58fc68c0f3c36ae6e0e64478a69c0-Supplemental-Conference.pdf

Neural Information Processing Systems

It consists of an image encoder with a Vision Transformer [17] architecture, a text encoder with a similar Transformer architecture, and heads that predict bounding boxes and label scores from provided images and text queries. Input(s) An image and a list of free-text object descriptions (queries).



1cc70be9fb6a83bc46cf4ac21a91e0b0-Supplemental-Conference.pdf

Neural Information Processing Systems

Algorithm 1 Association Graph Learning (TRAININGTIME) Require: {Dtrt }Tt=1: Training sets of all tasks; T: Number of tasks; C: Number of all classes; E: Shared feature extractor; WT,WC: Parameters of metric functions in the association graph; L: Number of GNN layers; {Wl}Ll=1: Parameters of all GNN layers; {ft}Tt=1: Task-specific classifiers; ฮป: Learning rate. For clarity, we provide the algorithms during training and test in Algorithm 1 and Algorithm 2, respectively. Algorithm 2 Association Graph Learning (TESTTIME) Require: xt: one test instance from the t-th task; E: Trained the feature extractor; GT,GC: Trained task and class graph; L: Number of GNN layers; {Wl}Ll=1: Trained parameters of all GNN layers; ft: The trained task-specific classifier. In this section, we provide the class assignment of all datasets under different missing rates. Table B.1, B.2, B.3 shows the class assignment for Office-Home, Office-Caltechand ImageCLEF, respectively.


Navigating Extremes: Dynamic Sparsity in Large Output Spaces

Neural Information Processing Systems

In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a much more memory efficient training process,as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, mostimplementations simulate sparsity by masking weights. In this paper, we leverage recent advances in semi-structured sparse training to apply DST in the domain of classificationwith large output spaces, where memory-efficiency is paramount. With a label space of possibly millions of candidates,the classification layer alone will consume several gigabytes of memory. Switching from a dense to a fixed fan-in sparse layer updated with sparse evolutionary training (SET); however, severely hampers training convergence, especiallyat the largest label spaces. We find that the gradients fed back from the classifier into the text encoder make itmuch more difficult to learn good input representations, despite using a dense encoder.By employing an intermediate layer or adding an auxiliary training objective, we recover most of the generalisation performance of the dense model. Overall, we demonstrate the applicability of DST in a challenging domain, characterized by a highly skewed label distribution, that lies outside of DST's typical benchmark datasets, and enable end-to-end training with millions of labels on commodity hardware.


Evidential Mixture Machines: Deciphering Multi-Label Correlations for Active Learning Sensitivity

Neural Information Processing Systems

Multi-label active learning is a crucial yet challenging area in contemporary machine learning, often complicated by a large and sparse label space. This challenge is further exacerbated in active learning scenarios where labeling resources are constrained. Drawing inspiration from existing mixture of Bernoulli models, which efficiently compress the label space into a more manageable weight coefficient space by learning correlated Bernoulli components, we propose a novel model called Evidential Mixture Machines (EMM). Our model leverages mixture components derived from unsupervised learning in the label space and improves prediction accuracy by predicting weight coefficients following the evidential learning paradigm. These coefficients are aggregated as proxy pseudo counts to enhance component offset predictions. The evidential learning approach provides an uncertainty-aware connection between input features and the predicted coefficients and components. Additionally, our method combines evidential uncertainty with predicted label embedding covariances for active sample selection, creating a richer, multi-source uncertainty metric beyond traditional uncertainty scores. Experiments on synthetic datasets show the effectiveness of evidential uncertainty prediction and EMM's capability to capture label correlations through predicted components. Further testing on real-world datasets demonstrates improved performance compared to existing multi-label active learning methods.


Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs

Neural Information Processing Systems

Deep supervised models possess significant capability to assimilate extensive training data, thereby presenting an opportunity to enhance model performance through training on multiple datasets. However, conflicts arising from different label spaces among datasets may adversely affect model performance. In this paper, we propose a novel approach to automatically construct a unified label space across multiple datasets using graph neural networks. This enables semantic segmentation models to be trained simultaneously on multiple datasets, resulting in performance improvements.


Multiclass Transductive Online Learning

Neural Information Processing Systems

We consider the problem of multiclass transductive online learning when the number of labels can be unbounded. Previous works by Ben-David et al. [1997] and Hanneke et al. [2024] only consider the case of binary and finite label spaces respectively. The latter work determined that their techniques fail to extend to the case of unbounded label spaces, and they pose the question of characterizing the optimal mistake bound for unbounded label spaces. We answer this question, by showing that a new dimension, termed the Level-constrained Littlestone dimension, characterizes online learnability in this setting. Along the way, we show that the trichotomy of possible minimax rates established by Hanneke et al. [2024] for finite label spaces in the realizable setting continues to hold even when the label space is unbounded.



GAIA: Delving into Gradient-based Attribution Abnormality for Out-of-distribution Detection Jinggang Chen

Neural Information Processing Systems

Consequently, we investigate how attribution gradients lead to uncertain explanation outcomes and introduce two forms of abnormalities for OOD detection: the zero-deflation abnormality and the channel-wise average abnormality.